
    A compact aVLSI conductance-based silicon neuron

    We present an analogue Very Large Scale Integration (aVLSI) implementation that uses first-order lowpass filters to implement a conductance-based silicon neuron for high-speed neuromorphic systems. The aVLSI neuron consists of a soma (cell body) and a single synapse, which is capable of linearly summing both the excitatory and inhibitory postsynaptic potentials (EPSPs and IPSPs) generated by spikes arriving from different sources. Rather than biasing the silicon neuron with different parameters for different spiking patterns, as is typically done, we provide digital control signals, generated by an FPGA, to the silicon neuron to obtain different spiking behaviours. The proposed neuron occupies only ~26.5 μm² in the IBM 130 nm process and can therefore be integrated at very high density. Circuit simulations show that this neuron can emulate the different spiking behaviours observed in biological neurons. Comment: BioCAS-201
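
    A minimal Python sketch of the principle the abstract describes: a synaptic conductance formed by first-order lowpass filtering of an input spike train, driving a conductance-based membrane equation. All parameter values below (time constants, reversal potentials, the synaptic weight) are illustrative assumptions, not values from the paper.

        import numpy as np

        def lowpass(x, tau, dt):
            # First-order lowpass filter: the building block named in the abstract.
            y = np.zeros_like(x)
            for i in range(1, len(x)):
                y[i] = y[i - 1] + (dt / tau) * (x[i - 1] - y[i - 1])
            return y

        dt = 1e-4                                 # 0.1 ms step
        t = np.arange(0.0, 0.1, dt)               # 100 ms window
        spikes = np.zeros_like(t)
        spikes[::100] = 1.0                       # an input spike every 10 ms

        # Synaptic conductance = weighted, lowpass-filtered input spikes
        # (the weight is an assumption, chosen so the inputs can drive spikes).
        g_exc = 5000.0 * lowpass(spikes, tau=5e-3, dt=dt)

        # Conductance-based membrane: dV/dt = (E_L - V)/tau_m + g_exc*(E_exc - V).
        V, E_L, E_exc, tau_m, V_th = -70e-3, -70e-3, 0.0, 20e-3, -50e-3
        n_out = 0
        for g in g_exc:
            V += dt * ((E_L - V) / tau_m + g * (E_exc - V))
            if V >= V_th:                         # threshold crossing: spike and reset
                n_out += 1
                V = E_L
        print(n_out, "output spikes")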

    A Reconfigurable Mixed-signal Implementation of a Neuromorphic ADC

    We present a neuromorphic Analogue-to-Digital Converter (ADC), which uses integrate-and-fire (I&F) neurons as the encoders of the analogue signal, with modulated inhibition to decohere the neuronal spike trains. The architecture consists of an analogue chip and a control module. The analogue chip comprises two scan chains and a two-dimensional integrate-and-fire neuronal array. Individual neurons are accessed via the chains one by one, without any encoder, decoder, or arbiter. The control module is implemented on an FPGA (Field Programmable Gate Array), which sends scan-enable signals to the scan chains and controls the inhibition for individual neurons. Since the control module is implemented on an FPGA, it can be easily reconfigured. Additionally, we propose a pulse-width modulation methodology for the lateral inhibition, which uses different pulse widths to indicate different strengths of inhibition for each individual neuron, in order to decohere the neuronal spikes. Software simulations in this paper tested the robustness of the proposed ADC architecture to fixed random noise. A circuit simulation using ten neurons demonstrates the performance and feasibility of the architecture. Comment: BioCAS-201
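
    A behavioural sketch in Python of the encoding idea, with the pulse-width-modulated inhibition reduced to a per-spike hold-off whose duration sets the inhibition strength. The function name and all parameter values are assumptions for illustration, not the paper's circuit.

        import numpy as np

        def encode(signal, dt=1e-4, inhib_width=0.0, thresh=1.0):
            # Integrate-and-fire encoding of an analogue signal into spike times.
            # inhib_width stands in for the PWM inhibition: after each spike,
            # integration is held off for that long, so a wider pulse means
            # stronger inhibition and a sparser spike train.
            v, hold, times = 0.0, 0.0, []
            for i, x in enumerate(signal):
                if hold > 0.0:
                    hold -= dt
                    continue
                v += dt * x                   # integrate the input
                if v >= thresh:               # fire and reset
                    times.append(i * dt)
                    v = 0.0
                    hold = inhib_width
            return times

        t = np.arange(0.0, 0.1, 1e-4)
        sig = 300.0 * (1.0 + np.sin(2.0 * np.pi * 10.0 * t))
        # A wider inhibition pulse visibly thins the spike train:
        print(len(encode(sig)), len(encode(sig, inhib_width=2e-3)))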

    Single-bit-per-weight deep convolutional neural networks without batch-normalization layers for embedded systems

    Batch-normalization (BN) layers are thought to be an integral layer type in today's state-of-the-art deep convolutional neural networks for computer vision tasks such as classification and detection. However, BN layers introduce complexity and computational overheads that are highly undesirable for training and/or inference on low-power custom hardware implementations of real-time embedded vision systems such as UAVs, robots and Internet of Things (IoT) devices. They are also problematic when batch sizes need to be very small during training, and innovations such as residual connections, introduced more recently than BN layers, could potentially have lessened their impact. In this paper we aim to quantify the benefits BN layers offer in image classification networks, in comparison with alternative choices. In particular, we study networks that use shifted-ReLU layers instead of BN layers. Following experiments with wide residual networks applied to the ImageNet, CIFAR-10 and CIFAR-100 image classification datasets, we found that BN layers do not consistently offer a significant advantage, and that the accuracy margin they provide depends on the dataset, the network size, and the bit-depth of the weights. We conclude that in situations where BN layers are undesirable due to speed, memory or complexity costs, shifted-ReLU layers should be considered instead; we found they can offer advantages in all these areas, and often do not impose a significant accuracy cost. Comment: 8 pages, published IEEE conference paper
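
    The abstract does not define the exact form of its shifted-ReLU layer, so the sketch below assumes one common reading, clipping from below at a small negative constant rather than at zero, purely as an illustration of the layer type being substituted for BN.

        import numpy as np

        def shifted_relu(x, shift=-1.0):
            # Clip from below at `shift` instead of 0. The exact form and shift
            # value used in the paper are not given in the abstract; max(x, shift)
            # is an assumption here.
            return np.maximum(x, shift)

        print(shifted_relu(np.linspace(-3.0, 3.0, 7)))   # negatives saturate at -1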

    Neuromorphic implementations of polychronous spiking neural networks

    The object of this thesis is to investigate polychronous spiking neural networks using neuromorphic implementations. This type of neural network has enormous memory capacity, as it can store far more spatio-temporal patterns than it has neurons, which could help to explain how the human cortex can produce such a diversity of behaviour with a mere 10¹¹ neurons. To date, most published polychronous spiking neural networks have been implemented using software neuron models, and such simulations are not capable of emulating large-scale neural networks in real time. We therefore present a mixed-signal implementation of a reconfigurable polychronous spiking neural network with a vast capacity for storing spatio-temporal patterns. The work presented in this thesis includes the design of a polychronous spiking neural network using a novel delay-adaptation algorithm, an FPGA implementation of the proposed neural network, an analogue implementation of the proposed neural network, and their integration into a mixed-signal platform. Rather than using a weight-adaptation algorithm such as Spike Timing Dependent Plasticity (STDP) to prune and select appropriate subsets of delays during training, the proposed neural network generates delay paths de novo, so that only connections that actually appear in the training patterns are created. This allows the proposed network to use all the axons (variables) to store information. Spike Timing Dependent Delay Plasticity (STDDP) is proposed to fine-tune the delays of the axons and add dynamics to the network. The FPGA implementation uses a time-multiplexing approach, allowing us to achieve 4096 (4k) neurons and up to 1.15 million programmable delay axons. The analogue implementation comprises 50 neurons and 400 axons. Compared to the digital implementation, the analogue implementation is more biologically plausible, as the computation in biological neurons is conducted with analogue variables. An analogue memory with a novel structure and a very low leakage rate was designed and characterised for the analogue axon. In the mixed-signal platform, 4k analogue neurons were achieved using a time-multiplexing approach. Testing results show that the mixed-signal implementation of the proposed neural network is capable of successfully recalling up to 96% of the stored patterns. The results also show that the neural network is robust to noise and to problems such as device mismatch and process variation.
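
    A minimal sketch of delay adaptation in the spirit of STDDP: nudge each axonal delay toward the observed pre-to-post spike interval, then clamp it to the programmable range. The update rule, learning rate, and delay bounds are illustrative assumptions, not the thesis's exact algorithm.

        def stddp_update(delay, t_pre, t_post, lr=0.1, d_min=1.0, d_max=16.0):
            # Move the delay toward the interval the axon should span (all in ms).
            target = t_post - t_pre
            delay += lr * (target - delay)
            return min(max(delay, d_min), d_max)   # keep within hardware range

        d = 5.0
        for _ in range(20):
            d = stddp_update(d, t_pre=0.0, t_post=8.0)
        print(round(d, 2))   # converges toward 8.0 ms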

    Breaking Liebig’s Law: An Advanced Multipurpose Neuromorphic Engine

    We present a massively parallel, scalable, multi-purpose neuromorphic engine. All existing neuromorphic hardware systems suffer from Liebig's law (the performance of a system is limited by the component in shortest supply), as they have fixed numbers of dedicated neurons and synapses for specific types of plasticity. For any application, it is always the availability of one of these components that limits the size of the model, leaving the others unused. To overcome this problem, our engine adopts a novel architecture: an array of identical components, each of which can be configured as a leaky integrate-and-fire (LIF) neuron, a learning synapse, or an axon with trainable delay. Spike timing dependent plasticity (STDP) and spike timing dependent delay plasticity (STDDP) are the two supported learning rules. All parameters are stored in SRAM, so that runtime reconfiguration is supported. As a proof of concept, we have implemented a prototype system with 16 neural engines, each consisting of 32768 (32k) components, yielding half a million components on an entry-level FPGA (Altera Cyclone V). We verified the prototype system with measurement results. To demonstrate that our neuromorphic engine is a high-performance and scalable digital design, we also implemented it in TSMC 28 nm HPC technology. Place-and-route results using Cadence Innovus at a clock frequency of 2.5 GHz show that the engine achieves an excellent area efficiency of 1.68 μm² per component: 256k (2¹⁸) components in a silicon area of 650 μm × 680 μm (∼0.44 mm², with 98.7% utilization of the silicon area). The power consumption of this engine is 37 mW, yielding a power efficiency of 0.92 pJ per synaptic operation (SOP).
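
    The quoted figures imply a throughput that is easy to check: at 37 mW and 0.92 pJ per synaptic operation, the engine sustains roughly 40 billion SOPs per second. A two-line check, using only numbers from the abstract:

        power_w = 37e-3                      # reported power: 37 mW
        energy_per_sop_j = 0.92e-12          # reported efficiency: 0.92 pJ/SOP
        print(power_w / energy_per_sop_j)    # ≈ 4.0e10 synaptic operations per second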

    An FPGA-Based Massively Parallel Neuromorphic Cortex Simulator

    This paper presents a massively parallel and scalable neuromorphic cortex simulator designed for simulating large and structurally connected spiking neural networks, such as complex models of various areas of the cortex. The main novelty of this work is the abstraction of a neuromorphic architecture into clusters represented by minicolumns and hypercolumns, analogous to the fundamental structural units observed in neurobiology. Without this approach, simulating large-scale fully connected networks requires prohibitively large memory to store look-up tables for point-to-point connections. Instead, we use a novel architecture, based on the structural connectivity of the neocortex, in which all the required parameters and connections can be stored in on-chip memory. The cortex simulator can easily be reconfigured to simulate different neural networks, without any change in hardware structure, by reprogramming the memory. A hierarchical communication scheme allows one neuron to have a fan-out of up to 200k neurons. As a proof of concept, an implementation on one Altera Stratix V FPGA was able to simulate 20 million to 2.6 billion leaky integrate-and-fire (LIF) neurons in real time. We verified the system by emulating a simplified auditory cortex (with 100 million neurons). The cortex simulator achieved a low power dissipation of 1.62 μW per neuron. With the advent of commercially available FPGA boards, our system offers an accessible and scalable tool for the design, real-time simulation, and analysis of large-scale spiking neural networks.
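
    A back-of-envelope calculation shows why explicit point-to-point connection tables are ruled out at these scales, motivating the minicolumn/hypercolumn abstraction. Neuron count and fan-out come from the abstract; the bytes-per-entry figure is an assumption.

        n_neurons = 100_000_000        # the verified auditory-cortex model
        fan_out = 200_000              # maximum fan-out quoted in the abstract
        bytes_per_entry = 4            # assumed 32-bit target index per connection

        lut_bytes = n_neurons * fan_out * bytes_per_entry
        print(lut_bytes / 1e12, "TB")  # 80.0 TB at full fan-out: far beyond on-chip memory

        # With structured connectivity, only per-cluster rules are stored,
        # e.g. one rule per (hypercolumn, minicolumn) pair instead of per synapse.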

    Reduced-memory training and deployment of deep residual networks by stochastic binary quantization

    Motivated by the goal of enabling energy-efficient and/or lower-cost hardware implementations of deep neural networks, we describe a method for modifying the standard backpropagation algorithm that reduces the memory usage during training by up to a factor of 32 compared with standard single-precision floating-point implementations. The method is inspired by recent work on feedback alignment in the context of seeking neurobiological correlates of backpropagation-based learning; similar to feedback alignment, we also calculate gradients imprecisely. Specifically, our method introduces stochastic binarization of hidden-unit activations for use in the backward pass, after they are no longer needed in the forward pass. We show that without stochastic binarization the method is far less effective. As verification of the effectiveness of the method, we trained wide residual networks with 20 weight layers on the CIFAR-10 and CIFAR-100 image classification benchmarks, achieving error rates of 5.43% and 23.01%, respectively. These error rates compare with 4.53% and 20.51% for the same network trained without stochastic binarization. Moreover, we also investigated learning binary weights in deep residual networks and demonstrate, for the first time, that networks using binary weights at test time can perform equally to full-precision networks on CIFAR-10, with both achieving a 4.5% error rate using a wide residual network with 20 layers of weights. On CIFAR-100, binary weights at test time gave an error rate of 22.28%, within 2% of the full-precision case.
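
    A minimal sketch of stochastic binarization, assuming the common unbiased scheme in which an activation in [-1, 1] maps to +1 with probability (1 + a)/2; the abstract only states that hidden activations are stochastically binarized for the backward pass, so the exact scheme here is an assumption. Storing one bit instead of a 32-bit float per activation is what yields the factor-32 memory reduction.

        import numpy as np

        def stochastic_binarize(a):
            # Map activations in [-1, 1] to {-1, +1} with P(+1) = (1 + a) / 2,
            # which makes the binarized value an unbiased estimate: E[b] = a.
            a = np.clip(a, -1.0, 1.0)
            return np.where(np.random.rand(*a.shape) < (1.0 + a) / 2.0, 1.0, -1.0)

        a = np.array([-0.8, 0.0, 0.5])
        samples = np.stack([stochastic_binarize(a) for _ in range(10000)])
        print(samples.mean(axis=0))   # ≈ a, recovered from 1-bit samples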